
    Automatic detection of borrowing (Open problems in computational diversity linguistics 2)

    This is the third of a series of 12 blog posts published in 2019, discussing open problems in computational diversity linguistics. It discusses the problem of automatic borrowing detection.

    Automatic sound law induction (Open problems in computational diversity linguistics 3)

    This is the fourth of a series of 12 blog posts published in 2019, discussing open problems in computational diversity linguistics. It discusses the problem of automatic sound law induction.

    Bangime: secret language, language isolate, or language island? A computer‐assisted case study

    We report the results of a qualitative and quantitative lexical comparison between Bangime and neighboring languages. Our results indicate that the status of the language as an isolate remains viable, and that Bangime speakers have had different levels of language contact with other Malian populations at various points throughout their history. Bangime speakers, the Bangande, claim Dogon ancestry. The Bangande portray this connection to Dogon through the fact that the language has both recent borrowings from neighboring Dogon varieties and more rooted vocabulary from Dogon languages spoken to the east from whence the Bangande claim to have come. Evidence of multilayered long‐term contact is clear: lexical items have permeated even core vocabulary. However, strikingly, the Bangande are seemingly unaware that their language is not intelligible with any Dogon variety. We hope that our findings will influence future studies on the reconstruction of the Dogon languages and other neighboring language varieties to shed light on the mysterious history of Bangime and its speakers.

    Trimming Phonetic Alignments Improves the Inference of Sound Correspondence Patterns from Multilingual Wordlists

    Sound correspondence patterns form the basis of cognate detection and phonological reconstruction in historical language comparison. Methods for the automatic inference of correspondence patterns from phonetically aligned cognate sets have been proposed, but their application to multilingual wordlists requires extremely well-annotated datasets. Since annotation is tedious and time-consuming, it would be desirable to find ways to improve aligned cognate data automatically. Taking inspiration from trimming techniques in evolutionary biology, which improve alignments by excluding problematic sites, we propose a workflow that trims phonetic alignments in comparative linguistics prior to the inference of correspondence patterns. Testing these techniques on a large standardized collection of ten datasets with expert annotations from different language families, we find that the best trimming technique substantially improves the overall consistency of the alignments. The results show a clear increase in the proportion of frequent correspondence patterns and words exhibiting regular cognate relations. Comment: The paper was accepted at the SIGTYP workshop 2023 co-located with EAC
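
The idea of trimming an alignment by excluding problematic sites can be illustrated with a minimal sketch. It assumes a plain gap-proportion criterion: a column is dropped when more than a chosen share of its cells are gaps. This is an illustration only, not the workflow or the trimming techniques evaluated in the paper. The names `gap_ratio` and `trim_alignment`, the threshold of 0.5, and the toy alignment are all hypothetical.

```python
from typing import List

GAP = "-"  # assumed gap symbol for this sketch


def gap_ratio(column) -> float:
    """Return the share of gap symbols in one alignment column."""
    return sum(1 for segment in column if segment == GAP) / len(column)


def trim_alignment(alignment: List[List[str]], threshold: float = 0.5) -> List[List[str]]:
    """Drop every column whose gap ratio exceeds the threshold.

    The alignment is a list of rows of equal length, one row per word.
    """
    if not alignment:
        return []
    columns = list(zip(*alignment))
    keep = [i for i, column in enumerate(columns) if gap_ratio(column) <= threshold]
    return [[row[i] for i in keep] for row in alignment]


# Hypothetical toy alignment of four cognate words (not real data).
alignment = [
    ["t", "a", "-", "k", "-"],
    ["t", "a", "-", "k", "a"],
    ["d", "a", "n", "g", "-"],
    ["t", "o", "-", "k", "-"],
]

trimmed = trim_alignment(alignment, threshold=0.5)
for row in trimmed:
    print(" ".join(row))
```

In this toy case the third and fifth columns consist of 75% gaps and are removed, leaving three columns per word. A realistic criterion would likely also need to protect a minimal word skeleton so that short alignments are not trimmed away entirely.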

    Statistical proof of language relatedness (Open problems in computational diversity linguistics 7)

    This is the eighth of a series of 12 blog posts published in 2019, discussing open problems in computational diversity linguistics. It discusses the problem of statistical proof of language relatedness.

    Formal and quantitative approaches to historical language comparison

    Lecture given at the Fifth Pavia International Summer School for Indo-European Linguistics (Università di Pavia, 2022-09-05/09).

    Save the trees

    Skepticism regarding the tree model has a long tradition in historical linguistics. Although scholars have emphasized that the tree model and its long-standing counterpart, the wave theory, are not necessarily incompatible, the opinion that family trees are unrealistic and should be completely abandoned in the field of historical linguistics has always enjoyed a certain popularity. This skepticism has further increased with the advent of recently proposed techniques for data visualization which seem to confirm that we can study language history without trees. In this article, we show that the concrete arguments that have been brought up in favor of achronistic wave models do not hold. By comparing the phenomenon of incomplete lineage sorting in biology with processes in linguistics, we show that data which do not seem as though they can be explained using trees can indeed be explained without turning to diffusion as an explanation. At the same time, methodological limits in historical reconstruction might easily lead to an overestimation of regularity, which may in turn appear as conflicting patterns when the researcher is trying to reconstruct a coherent phylogeny. We illustrate how, in several instances, trees can benefit language comparison, although we also discuss their shortcomings in modeling mixed languages. While acknowledging that not all aspects of language history are tree-like, and that integrated models which capture both vertical and lateral language relations may depict language history more realistically than trees do, we conclude that all models claiming that vertical language relations can be completely ignored are essentially wrong: either they still tacitly draw upon family trees or they only provide a static display of data and thus fail to model temporal aspects of language history.

    Sequence Comparison in Historical Linguistics


    Multiple sequence alignment in historical linguistics. A sound class based approach

    In this paper, a new method for multiple sequence alignment in historical linguistics is presented. The algorithm is based on the traditional framework of progressive multiple sequence alignment (cf. Durbin et al. 2002:143-149), whose shortcomings are addressed by (1) a sound class representation of phonetic sequences (cf. Dolgopolsky 1986, Turchin et al. 2010) accompanied by specific scoring functions, (2) the modification of gap scores based on prosodic context, and (3) a new method for the detection of swapped sites in already aligned sequences. The algorithm is implemented as part of the LingPy library (http://lingulist.de/lingpy), a suite of open source Python modules for various tasks in quantitative historical linguistics. The method was tested on a benchmark dataset of 152 manually edited multiple alignments covering data for 192 Bulgarian dialects (Prokić et al. 2009). The results show that the alignments produced by the new method differ from the gold standard in only 5% of all sequences.
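
The role of sound classes in alignment can be sketched with a small pairwise example. The sketch below is not the algorithm from the paper and does not use the LingPy API: it is a plain Needleman-Wunsch alignment in which segments are compared by class rather than by identity. The class table `SOUND_CLASSES` is a simplified, hypothetical stand-in for a Dolgopolsky-style scheme, and the scores (match 1, mismatch -1, gap -1) and toy words are assumptions chosen for illustration.

```python
# Simplified, hypothetical sound-class table (not the published class system).
SOUND_CLASSES = {
    "p": "P", "b": "P", "f": "P", "v": "P",
    "t": "T", "d": "T",
    "s": "S", "z": "S",
    "k": "K", "g": "K", "x": "K",
    "m": "M", "n": "N", "r": "R", "l": "R",
    "a": "V", "e": "V", "i": "V", "o": "V", "u": "V",
}


def to_classes(segments):
    """Map each segment to its sound class; unknown segments map to themselves."""
    return [SOUND_CLASSES.get(segment, segment) for segment in segments]


def align(seq_a, seq_b, match=1, mismatch=-1, gap=-1):
    """Globally align two segment lists, scoring on sound classes."""
    cls_a, cls_b = to_classes(seq_a), to_classes(seq_b)
    n, m = len(seq_a), len(seq_b)

    # Fill the dynamic programming matrix.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if cls_a[i - 1] == cls_b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align two segments
                score[i - 1][j] + gap,      # gap in seq_b
                score[i][j - 1] + gap,      # gap in seq_a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if cls_a[i - 1] == cls_b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(seq_a[i - 1])
                out_b.append(seq_b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return out_a[::-1], out_b[::-1], score[n][m]


# Hypothetical toy words: t/d and k/g share a class, so they count as matches.
aligned_a, aligned_b, total = align(list("tak"), list("dagu"))
print(" ".join(aligned_a))  # t a k -
print(" ".join(aligned_b))  # d a g u
print(total)                # 2
```

Because `t`/`d` and `k`/`g` fall into the same classes, the two words align segment by segment even though only the vowel is identical. A progressive multiple alignment would apply such pairwise comparisons repeatedly along a guide tree, which this sketch does not attempt.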

    Computational Historical Linguistics

    In the course, I give a basic introduction to some of the recent developments in the field of computational historical linguistics. While this field is predominantly represented by phylogenetic approaches with which scholars try to infer phylogenetic trees from different kinds of language data, the approach taken here is much broader, concentrating specifically on the prerequisites needed in order to get one’s data into the shape to carry out phylogenetic analyses. As a result, we will concentrate on topics such as automated phonetic alignments, automated cognate detection, the handling of semantic shift, and the modeling of word formation in comparative wordlists. A major goal of the course is to emphasize the importance of computer-assisted (as opposed to computer-based) approaches, which acknowledge the importance of qualitative work in historical language comparison. The course will be accompanied by code examples which participants can try to replicate on their computers.